Complex Predicates Annotation in a Corpus of Portuguese

Authors

  • Iris Hendrickx
  • Amália Mendes
  • Sílvia Pereira
  • Anabela Gonçalves
  • Inês Duarte
Abstract

We present an annotation scheme for complex predicates (CPs), understood as constructions with more than one lexical unit, each contributing part of the information normally associated with a single predicate. We discuss our annotation guidelines for four types of complex predicates and the treatment of several difficult cases related to ambiguity, overlap and coordination. We then describe the process of marking up the Portuguese CINTIL corpus of 1M tokens (written and spoken) with a new layer of information regarding complex predicates. Finally, we present the outcomes of the annotation work and statistics on the types of CPs found in the corpus.

Similar resources

Identifying and Analyzing Brazilian Portuguese Complex Predicates

The Semantic Role Labeling annotation task depends on the correct identification of predicates before arguments can be identified and assigned role labels. However, most predicates are not constituted only by a verb: they constitute Complex Predicates (CPs) not yet available in a computational lexicon. In order to create a dictionary of CPs, this study employs a corpus-based methodology. Searches ...

The Interlanguage of Persian Learners of Italian: a Focus on Complex Predicates

This paper aims at investigating the acquisition of Italian complex predicates by native speakers of Persian. Complex predication is not as pervasive a phenomenon in Italian as it is in Persian. Yet Italian native speakers use complex predicates productively; spontaneous data show that Persian learners of Italian seem to be perfectly aware of Italian complex predicates and use this familiar fea...

Collective Elaboration of a Coreference Annotated Corpus for Portuguese Texts

This paper describes the collaborative creation of a corpus with coreference annotation for Portuguese. The annotation was performed using the coreference annotation tool CORP and the editing tool CorrefVisual. The texts were automatically annotated and manually revised by Portuguese speakers. As a result, a new corpus for coreference studies was produced for Portuguese.

Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary

In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our met...

Morphological Annotation System for Automated Tagging of Electronic Textual Corpora: from English to Romance Languages

Based on the Penn-Helsinki Parsed Corpus of Middle English[1], the Tycho Brahe Parsed Corpus of Historical Portuguese[2] consists of an electronic annotated corpus composed of prose, originally written in Portuguese by native speakers of European Portuguese (henceforth EP) born between the 16th and 19th centuries. The present annotation system to be applied to Portuguese has been developed in t...

Publication date: 2010